Experimental monthly long-range forecasts for the United Kingdom


Part III. Skill of the monthly forecasts 


By C.K. Folland, A. Woodcock and L.D. Varah 


(Meteorological Office, Bracknell) 


Summary 


Evidence is shown for a recent fairly sudden, though modest, improvement in the skill of the monthly forecasts, especially those
of temperature extremes and rainfall. These forecasts are derived from forecast patterns of pressure at mean sea level (PMSL) 
made for three periods within the month. A first analysis is given of the skill of the derived monthly mean PMSL forecasts and 
preliminary analyses of the recent skill of the temperature and rainfall forecasts for several distinct periods within the month 
ahead, including the second half-month, to help indicate whether the forecasts have skill on the purely ‘long-range’ time-scale. 


1. Introduction 


This paper concentrates on the skill of the issued forecasts which are based on the contributions of all 
the forecasting techniques described in Folland and Woodcock (1986). (Folland and Colman (1986) 
provide a preliminary discussion of the skill of the most important statistical technique, the multivariate 
forecasting technique.) A discussion about the skill of the forecasts of pressure at mean sea level (PMSL), 
temperature and rainfall issued in the medium range (first 5 days of the month ahead), mid range 
(remainder of the first half-month) and long range (second half-month) since July 1982 is included. 


2. General remarks on the skill of the forecasts 


The forecasts of monthly mean temperature anomalies for each district are given in terms of the 
‘best-estimate’ of whether it will be very cold, cold, average, warm or very warm. In the long term the 
probability of each of these ‘quints’ is the same. Similarly, forecasts of percentage of average rainfall 
(hereafter referred to as rainfall percentage) are categorized according to whether it will be dry, average 
or wet, where each ‘terce’ is equally likely in the long term. A map of the ten districts for which forecasts 
are issued is given in Folland and Woodcock (1986). Until 1979 the forecasts were usually only made for 
the single most probable quint or terce category which is called the best-estimate forecast. The current 
system also forecasts the probabilities of each of the quints and terces, and is designed so that one of the 
quints or terces is forecast to have the highest probability and this is regarded as the best-estimate 
forecast. Thus best-estimate forecasts can be continuously assessed from 1964 to date. Assessment of the
skill of the complete set of probability forecasts is more difficult, though important, and will be tackled 
in a later paper. 

Since July 1982, a full record of best-estimate forecasts of temperature anomaly and rainfall 
percentage has been kept separately for the medium, mid and long ranges. In addition, forecasts of 
PMSL have also been analysed since July 1982 for six grid points near the United Kingdom (four of the 
points are shown in Folland and Woodcock (1986) with the two additional points at 55° N, 10° W and 
55° N, 00° W) though only the skill of forecasts for the month as a whole, and its first and second halves, 
is available at present. These objectively made assessments cannot, of course, be taken as complete 
measures of the value or utility of the long-range forecasts. However, the assessments of the skill of 
recent PMSL forecasts do at least start to tackle the problem of estimating the value of what used to be 
called ‘additional information’, i.e. the worded part of the forecast that describes the expected sequence 
of circulation types and the general characteristics of the associated weather. From a scientific point of 
view the assessments of the forecast PMSL patterns are therefore fundamental, since the temperature 
and rainfall forecasts, as well as the additional information, are now largely derived from these patterns. 


3. Skill of the temperature and rainfall forecasts 
(a) General problem of assessing long-range forecasts 

The assessment of the skill of forecasts is a very difficult matter. Each user of the forecasts has a 
different sensitivity to their content largely because their value (or ‘utility’) depends on the way the 
forecasts are phrased, their level of detail and on whether the user has an effective way of using the 
forecasts and monitoring their utility. The very act of receiving the forecasts, almost irrespective of their 
content, can be beneficial, as the background meteorological information supplied about district 
climatological averages etc. is of considerable potential value in its own right. For some users this 
information may be more useful than the low skill that the forecasts currently have in predicting 
deviations from these averages. On the other hand, some users may regard an individual forecast as 
essentially correct when an objective assessment shows little or even negative skill relative to a random 
chance level. For example, consider a forecast that predicts a change of circulation type, and a marked 
increase in temperature, in the second half of the month ahead. If the change actually occurs but was 
delayed for a few days after the beginning of the second half-month, a set of extremely cold observed 
temperatures at the beginning of the period may render the second half-month as a whole rather cold 
even though most of it was rather mild. Indeed, small timing errors of this kind may be imperceptible to 
many users interested in planning on the basis of the general nature of the weather over 2 weeks. This 
lack of sensitivity is itself influenced by the current non-availability of more detailed forecasts of good 
accuracy. Thus more detailed and accurate forecasts, if they become available, might appreciably 
influence the mode of operation of some users, quite apart from any increase in their number. 
Presumably, the occasional serious forecasting failures would then be much more damaging than is 
possible at the present time. So at this stage we have confined our assessments of forecasts to methods 
that help researchers into long-range forecasting to monitor their own performance. Every forecast 
assessment system is rooted in a need to create indices of the skill or information content of the forecasts 
relative (in most cases) to a measure of what is achievable by following some fixed, simpler strategy such 
as random chance forecasts (‘guesswork’), climatology, or forecasts of the persistence of some recently 
observed weather anomaly. 

The technique currently in most frequent use in the Synoptic Climatology Branch of the 
Meteorological Office for assessing UK long-range forecasts of temperature and rainfall of the
best-estimate type is known as the Folland-Painting or FP system. As described in section (d), other
measures of skill are also found to be useful for assessing the PMSL forecasts. This is not surprising as
any assessment system has scientific value if it provides self-consistent answers to well-posed questions 
about forecast performance. 

A final problem for all measures of forecast skill is the need to define, in an appropriate way, the 
climatological averages from which the anomalies are being forecast. Long-range forecasts are currently 
made in anomaly form but the climate continually fluctuates. In the United Kingdom this problem has 
been quite acute in recent decades (Gilchrist 1982, Folland et al. 1985), especially for temperature
averages in April and October. The traditional approach is to define ‘average’ over a recent period of 30 
years which is regarded as representative of the current climate. This convention has been adopted here 
but its limitations should be borne in mind. Its use has consistently led to an excessive number of 
observations of colder than normal conditions in the north-west of the United Kingdom since 1964, 
especially in the north and west of Scotland, possibly related to the cooling of the North Atlantic to the 
west of Scotland since about 1955, a cooling which has only recently ceased (Folland and Parker 1986). 


(b) The FP forecast assessment system 


The basic ideas underlying the system are described in Appendix 1. Table I shows the FP scoring 
tables for (a) temperature quints, (b) ‘grouped’ temperature quints (here the cold, average and warm 
quints are grouped together) and (c) rainfall terces. The quint and terce boundaries are defined 
separately in each district and for each of the overlapping 24 calendar monthly periods for which 
forecasts are made each year. Table II shows the ‘Sutcliffe’ scoring tables (Freeman 1966) which had 
previously been the most important assessment technique and which are still in limited use. 


Table I(a). The Folland-Painting scoring table for assessing forecasts of temperature quints


Forecast Observed Chance scores for 
forecast categories 
Very cold Average Very warm 
Ai A; As 


Very cold Fi $2 : —-2 . —2.6 
Cold F, 1.0 y —62 A —2.4 
Average F; —12 2.8 : he 
Warm F, —2.4 ~$2 R 1.0 
Very warm F; —26 iz E > Be 


Chance scores for 0 0 0 
observed categories 


Table I(b). As Table I(a) but for grouped temperature quints 


Forecast Observed Chance scores for 
forecast categories 
Very cold Cold to Very warm 
warm 
Ai A2 A; 


Very cold F; 8.5 =2. 
Cold towarm  F) 1. 
Very warm F; a 


Chance scores for 0 
observed categories 





Table I(c). As Table I(a) but for rainfall terces


Forecast        Observed        Chance scores for
                                forecast categories
                Average
                A2


Dry      F1     …
Average  F2     2.6
Wet      F3     …


Chance scores for       0
observed categories


Table II(a). The Sutcliffe scoring table for assessing forecasts of temperature quints


Forecast        Observed                            Chance scores for
                                                    forecast categories
                Very cold   Average   Very warm
                A1          A3        A5


Very cold  F1   4  -4
Cold       F2   …  -4
Average    F3   2  -3
Warm       F4   …
Very warm  F5   …


Chance scores for       …
observed categories


Table II(b). As Table II(a) but for rainfall terces


Forecast        Observed        Chance scores for
                                forecast categories
                Average
                A2


Dry      F1     0
Average  F2     4
Wet      F3     0


Chance scores for       1.33
observed categories


Tables I(a)-I(c) are designed so that the chance score is always zero no matter what the observed
(outcome) category. By contrast, the Sutcliffe tables have a positive chance score for average or
near-average outcome categories (i.e. quints 2, 3 and 4) and a negative chance score for extreme outcome
categories. This structure tends to give an artificially (slightly) higher score during a run of near-average
conditions than during a run of extremes. For individual forecasts, a more serious problem with the
Sutcliffe tables can be seen when comparing Table II(a) with Table I(a). Consider a forecast of quint 5
(F5) followed by an outcome of quint 3 (A3); the Sutcliffe system gives a score of 0 points. For a forecast
of quint 3 (F3) followed by an observation of quint 5 (A5), the score is now -3 points. The FP system
gives -1.2 points in both situations; this seems more logical as the ‘error’ in both situations is the same.
Another difference between the tables is that a correct forecast of a quint or terce category gains the same
maximum score in the Sutcliffe system no matter what the category is; in the FP system correct forecasts
of extreme categories have a higher score than correct forecasts of a near-average category.
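
The chance score quoted beneath each scoring table can be checked numerically. The sketch below assumes that a 'chance' forecaster picks each forecast category equally often, so that the chance score for an observed category is simply the mean of that column of the table; the function name is hypothetical. It is applied to the one fully legible column in the tables above, the Average (A2) column of Table II(b).

```python
def chance_score_for_observed(column_scores):
    """Mean score down one observed-category column.

    Assumes every forecast category is issued equally often by a
    random-chance forecaster (an assumption of this sketch).
    """
    return sum(column_scores) / len(column_scores)


# Table II(b), observed category A2 (Average): scores for forecasts F1, F2, F3.
sutcliffe_terce_average_column = [0, 4, 0]

chance = chance_score_for_observed(sutcliffe_terce_average_column)
print(round(chance, 2))  # 1.33, the value printed beneath Table II(b)
```

A table built on the FP principle would return zero from the same calculation for every observed column.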

It should be noted that the values of the quint and terce boundaries and the estimates of observed 
anomalies in each district have been recalculated back to 1964 and therefore differ from the values 
published in the old Monthly Weather Survey and Prospects during the period of public issue. This 
development has been made possible by the construction of a new district-average climate data base that 
commences in January 1951 and which is regularly updated in an automatic way using
quality-controlled data files created by the Advisory Services Branch. These data were still found to be
inadequate in amount to revise satisfactorily the (previously rather uneven) rainfall terce boundaries. So 
it was decided to use the very comprehensive automated statistical model of the climate of extreme 
rainfall totals, calculated for the United Kingdom on monthly (and longer) time-scales by Tabony (1977) 
and converted into a regular grid-point format by Colgate (personal communication). The result is to 
provide quint and terce boundaries at fixed values of temperature anomaly or rainfall percentage that 
vary smoothly through the calendar year and between adjacent districts. The averages about which the
anomalies are calculated (for a fixed set of stations) vary according to the epoch when the forecast was 
made.* 


(c) Assessments of monthly forecasts for 1964-86
Table III(a) shows the annual mean FP skill scores for best-estimate monthly forecasts that were
issued from 1964 to 1985. Table III(b) shows the equivalent results from the Sutcliffe system for
comparison with previous papers, e.g. Ratcliffe (1970), Jenkinson (1975) and Hardy (1980). For both
systems the skill, SK, of a given forecast is derived from the scores, S_d, for individual districts as follows:


\[
SK =
\begin{cases}
100\,\dfrac{\sum S_d}{\sum S_d^{\max}} & \text{for } \sum S_d > 0 \\[2ex]
100\,\dfrac{\sum S_d}{\left|\sum S_d^{\min}\right|} & \text{for } \sum S_d < 0
\end{cases}
\]


where S_d^max and S_d^min are the maximum and minimum possible scores (e.g. S_d^max is the score that would
be obtained if the outcome was correctly forecast). Thus SK varies between ±100%. For example, if the
observed and forecast quints for two districts are (As, Fs) and (Aa, F2), then their combined skill score is 
-16% (see Table I(a)). If an annual mean value of skill is required, the sum of the scores, ΣS_d, is taken
over all ten districts and 24 overlapping forecast months in a year (starting in January and finishing in 
mid-December to mid-January). SK can, of course, be calculated over any period and choice of districts. 
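
The conversion from summed scores to percentage skill can be sketched as follows. This is an illustration only: the function name and the input values are hypothetical, and the maximum and minimum possible scores for each district forecast are assumed to have been looked up beforehand from the appropriate scoring table for the category actually observed.

```python
def skill_percent(scores, max_scores, min_scores):
    """Percentage skill SK from per-district scores.

    scores      -- score obtained by each district forecast
    max_scores  -- best score attainable for each observed outcome
    min_scores  -- worst score attainable for each observed outcome
    A positive total is scaled by the summed maxima, a negative total
    by the magnitude of the summed minima, so SK lies within +/-100.
    """
    total = sum(scores)
    if total > 0:
        return 100.0 * total / sum(max_scores)
    if total < 0:
        return 100.0 * total / abs(sum(min_scores))
    return 0.0


# Hypothetical scores for two districts (not taken from Table I).
print(skill_percent([2.0, -1.0], [4.0, 4.0], [-4.0, -4.0]))
print(skill_percent([-2.0, -1.0], [4.0, 4.0], [-4.0, -4.0]))
```

Because the scores are summed before scaling, the same function serves for a single forecast, a year of forecasts or any chosen subset of districts.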





* The averaging periods used to calculate district mean anomalies for use with Tables I(a)-I(c) are as follows: 


Period of forecasts Temperature Rainfall
1964-75 1931-60 1941-70 
1976-80 1941-70 1941-70 
1981-86 1951-80 1951-80 


The 1931-60 temperature averages for the ten districts are estimates based on changes in Central England Temperature between 
1931-60 and 1941-70. 





382 Meteorological Magazine, 115, 1986 


Table III(a). Annual skill scores (percentages) of issued forecasts for 1964-85 compared with 
persistence, for all districts, using the Folland-Painting system (I = issued and P = persistence 
forecasts) 


Temperature Temperature Rainfall 
(quints) (grouped quints) (terces) 
I P I P I P

1964 1 —2 =< —6 12 
1965 18 29 —4 29 =. 
1966 12 —— 8 —5 =—2 
1967 5 16 12 9 
1968 5 —14 6 —14 
1969 23 16 7 17 


1970 20 13 A 13 
1971 —14 11 <q 
1972 1 Il 5 11 
1973 — 1 5 —*— 
1974 7 23 <2 14 
1975 9 =F 10 7 
1976 7 25 3 34 
1977 a 0 =f —2 
1978 —I5 — —15 
1979 13 23 17 30 
1980 <2 —25 i 7 
1981 14 =—16 A —4 
1982 =i <7 oe —18 
1983 15 17 24 25 
1984 —8 17 0 
1985 13 =3 15 “se 


1964-85 
mean 


Twice the 
standard 
error* 4 6 3 6 4 4 


* Assuming each year is independent of the next, calculated from twice the standard error of the underlying annual scores, only 
then converted to skill. 

Table III also shows skill scores for forecasts based on the use of persistence, i.e. a forecast of the same 
quint or terce category as was observed in the most recently observed non-overlapping month-long 
period. There is clearly a large variability in skill as well as a small overall positive ‘bias’ in the Sutcliffe 
quint skill values (because slightly more positively biased quint 2, 3 and 4 categories were observed 
between 1964 and 1985 than the chance expectation of 60%). Over the whole period, issued forecasts
averaged over all districts have statistically significant annual skill while persistence forecasts show 
significant annual skill for grouped quints only. 

Figs 1-3 show 4-year running-mean graphs of skill calculated from the FP system, where skill has 
been updated for each 2-month natural season. The diagrams show that it is almost certain that real 
variations have occurred in the skill of both issued and persistence forecasts. (The FP scores have been 
summed over running 4-year periods before converting them to skill.) Fig. 1 indicates an appreciable
variation in skill of forecasts of the persistence of temperature quints (persistence from the previous 
month), a peak in skill of issued temperature forecasts in the 1960s and a recent sharp, though modest, 
recovery of the skill of issued temperature and rainfall forecasts from low values in much of the 1970s. 
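
The running-mean calculation described above (scores summed over each window first, then converted to skill) might be sketched as below. The function name and the sample numbers are hypothetical, and the window is counted in forecasts rather than in years purely to keep the example short.

```python
def running_skill(scores, max_scores, min_scores, window):
    """Skill for each running window, summing scores BEFORE conversion.

    Each window's total score is scaled by the summed maximum (if the
    total is positive) or the magnitude of the summed minimum (if it
    is negative) over the same window.
    """
    result = []
    for start in range(len(scores) - window + 1):
        end = start + window
        total = sum(scores[start:end])
        if total > 0:
            result.append(100.0 * total / sum(max_scores[start:end]))
        elif total < 0:
            result.append(100.0 * total / abs(sum(min_scores[start:end])))
        else:
            result.append(0.0)
    return result


# Hypothetical scores with a window of two forecasts.
print(running_skill([1, -1, 2, -3], [4, 4, 4, 4], [-4, -4, -4, -4], window=2))
```

Summing before conversion is not the same as averaging the individual skill values, because the positive and negative branches use different denominators.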
Table III(b). As Table III(a) but using the Sutcliffe system


Temperature Rainfall 
(quints) (terces) 
I P I P 


1964 5 =e 21 
1965 26 32 2 
1966 14 ~—_ =—§ 
1967 Il 25 
1968 10 —_ 
1969 24 10 


1970 25 14 
1971 —14 
1972 o 16 
1973 —_ 9 
1974 6 24 
1975 12 —8 
1976 8 29 
1977 —8 —4 
1978 —f7 
1979 12 24 


1980 2 <a 
1981 15 —20 
1982 —- 2 
1983 15 16 
1984 18 —- 
1985 17 — 


1964-85 
mean 


Twice the 
standard 
error 


Fig. 2 (grouped quint assessments) places more emphasis on extremes (quints 1 and 5) being correct;
until recently, forecasts categorized into grouped quints showed less skill than did those categorized into 
quints. 

Fig. 3 indicates that the rainfall forecasts have increased in skill to the same extent as those for 
temperature both in absolute skill and relative to persistence. In fact the skill of the three-category 
forecasts of rainfall (terces) and temperature (grouped quints) changes in a rather similar way 
throughout 1964-86. Fig. 2 suggests that a rather sudden marked improvement in predictions of 
extreme temperature occurred quite recently. Fig. 4(a) throws light on this result. It shows the 4-year 
running mean of the ratio of the number of quints 1 and 5 forecast to the number observed, irrespective
of whether the forecasts were correct. This ratio increased suddenly from a near-constant value of about
0.2 between 1964 and 1980 to over 0.8 in 1982-85, i.e. the ‘boldness’ of the best-estimate temperature
forecasts has recently quadrupled. Despite the increased boldness in 1982-85, the probability that a
forecast of an extreme quint was correct was about the same as in 1964-80. The net result was a four-fold
increase in the number of correct forecasts of quints 1 and 5 (Fig. 4(b)). This is encouraging, as skilful
forecasts of extremes are probably of most use to customers. Note that the problem of lack of boldness in 
the forecasts and the need to tackle it was well recognized in the period of public issue, especially by 
Jenkinson (personal communication). 
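
The 'boldness' diagnostic can be illustrated with a short sketch. It takes boldness to be the number of extreme quints forecast divided by the number observed, the sense used in the caption of Fig. 4(a); the function name and the toy sequences of quints are hypothetical.

```python
def extreme_forecast_stats(forecast, observed, extremes=(1, 5)):
    """Boldness and success of forecasts of extreme quints.

    Returns (boldness, hit_rate, n_correct) where
      boldness  = extremes forecast / extremes observed
      hit_rate  = fraction of extreme forecasts that verified
      n_correct = number of correct extreme forecasts
    """
    n_forecast = sum(1 for f in forecast if f in extremes)
    n_observed = sum(1 for o in observed if o in extremes)
    n_correct = sum(
        1 for f, o in zip(forecast, observed) if f in extremes and f == o
    )
    boldness = n_forecast / n_observed if n_observed else float("nan")
    hit_rate = n_correct / n_forecast if n_forecast else float("nan")
    return boldness, hit_rate, n_correct


# Toy sequences of forecast and observed quints (not real data).
forecast_quints = [1, 3, 3, 5, 2, 3, 4, 3]
observed_quints = [1, 2, 5, 5, 1, 3, 5, 1]
print(extreme_forecast_stats(forecast_quints, observed_quints))
```

If the hit rate stays constant while boldness rises, the count of correct extreme forecasts rises in proportion, which is the pattern the text reports for 1982-85.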
Figure 1. Four-year running mean skill of issued (solid line) and persistence (dashed line) forecasts of temperature quints, based on the
Folland-Painting system, plotted every two months. The last point plotted is for season 4 of 1982 to season 3 of 1986.

Figure 2. As for Fig. 1 but for grouped temperature quints.

Figure 3. As for Fig. 1 but for rainfall terces.


There is still a strong tendency to forecast too many occasions of near-average rainfall (i.e. terce 2).
Thus in 1982-85 the number of forecasts of terces 1 and 3 was only about 70% of the number observed,
a percentage which was only a little more than in the period of public issue. Despite this, the likelihood
that a forecast of terce 1 or terce 3 is correct appears to have increased (Fig. 5). Note that the chance
percentage of such forecasts that are correct varies (mostly) with the number of terces 1 and 3 observed,
though in the long run the chance percentage will be very near 33.3%.

Figs 6(a)-6(c) show how the skill of the issued forecasts has varied over individual districts. Four-year
running-mean skill scores for all ten districts are shown, but for clarity only two are identified: district 5
(south-east and central southern England) and district 1 (eastern Scotland). District 5 has tended to have
the least successful forecasts, especially in the 1970s. District 1 has generally more successful forecasts 
especially in the 1970s. Recently the skill scores have tended to vary less between the districts. The 
differences in the 1970s are important as Nap et al. (1981) and Baker (1982) draw rather over-pessimistic 
conclusions about the performance of UK long-range forecasts as a whole by analysing data only from 
district 5 during the 1970s. 

The skill of monthly long-range forecasts also tends to vary with season (Table IV). Over the whole 
period 1964-86, the variations are not quite statistically significant, but they are appreciable. The 
correlation between the seasonal mean values of skill for temperature (quints) and rainfall (terces) is 
0.85, which is statistically significant at the 5% level. It is too early to draw firm conclusions about recent 
trends in the pattern of seasonal forecast skill. So far, though, forecasts for the traditionally most skilful 
natural seasons (summer and winter) have contributed most to recent increases in skill, especially 
forecasts of temperature in winter and rainfall in summer. Over the last few years forecasts in spring have 
been least successful. 
Figure 4. (a) Percentage of quints 1 and 5 forecast compared with number of quints 1 and 5 observed and (b) the percentage of
observed quints 1 and 5 which were correctly forecast: individual years (*), four-year running mean (solid line), four-year running
mean of chance percentage of correct forecasts of quints 1 and 5, given the number of quints 1 and 5 forecast (dashed line).


(d) PMSL forecasts for six grid points near the United Kingdom 


For each month, PMSL forecasts for the constituent medium-, mid- and long-range periods have 
been appropriately averaged to provide forecasts of mean monthly and half-monthly PMSL and PMSL 
anomalies (from a 1951-70 average) at each of the six grid points. Because the period available for
testing is brief (the 4 years from July 1982 to mid-June/mid-July 1986) only a short summary of the
results is given (Table V).

Figure 5. Four-year running mean of percentage of forecasts of rainfall terces 1 and 3 that were correct (solid line) and that expected
by chance to be correct given the number observed (dashed line).


The skill of the PMSL anomaly forecasts has been categorized according to the confidence levels C, D 
or E that accompany each forecast (the confidence scale runs from A (highest) to E (lowest), but only the 
range C to E is currently used) to provide an indication of whether these expressions of confidence made 
when the forecast is issued are meaningful. Four measures of skill are shown: 

(i) A measure of the skill of the PMSL anomaly forecasts in predicting correctly the observed sign of
the PMSL anomalies. This is called the ‘sign skill’ and is defined by:

\[ SS = \frac{N_c - N_i}{T} \]

where N_c and N_i are the numbers of correctly and incorrectly forecast grid points respectively, and T is the
total number of grid points forecast. This index has a value of zero when the numbers of correct and
incorrect forecasts are the same. SS provides the same measures of skill as does the FP system when
applied to two equi-probable observed and forecast categories (climatological probability 0.5 for each
category).

(ii) The root-mean-square error of the PMSL anomaly forecasts, (RMS)r in millibars. 

(iii) The root-mean-square error of forecasts assuming persistence of the PMSL anomaly observed in 
the previous month, (RMS)p in millibars. 

(iv) The mean correlation, ra between the forecast and observed PMSL anomalies. This is calculated 
as the grand average of correlations calculated for individual monthly forecasts (using Fisher’s z 
transformation to help calculate the average (e.g. Snedecor and Cochran 1973)). 
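
The measures listed above can be sketched in a few lines. The function names and sample anomalies are hypothetical; sign skill is returned here as a percentage, on the assumption that this is how the tabulated SS values are expressed, and grid points with a zero forecast or observed anomaly are counted in the total but as neither correct nor incorrect.

```python
import math


def sign_skill(forecast, observed):
    """Sign skill in per cent: 100 * (N_c - N_i) / T."""
    n_correct = sum(1 for f, o in zip(forecast, observed) if f * o > 0)
    n_incorrect = sum(1 for f, o in zip(forecast, observed) if f * o < 0)
    return 100.0 * (n_correct - n_incorrect) / len(forecast)


def rms_error(forecast, observed):
    """Root-mean-square difference between forecast and observed anomalies."""
    squared = [(f - o) ** 2 for f, o in zip(forecast, observed)]
    return math.sqrt(sum(squared) / len(squared))


def mean_correlation(correlations):
    """Average correlations via Fisher's z: z = atanh(r), then tanh(mean z)."""
    z_values = [math.atanh(r) for r in correlations]
    return math.tanh(sum(z_values) / len(z_values))


# Hypothetical PMSL anomalies (mb) at four grid points.
forecast_anom = [2.0, -1.0, 3.0, -4.0]
observed_anom = [1.0, 1.0, 2.0, -2.0]
print(sign_skill(forecast_anom, observed_anom))
print(rms_error(forecast_anom, observed_anom))
print(mean_correlation([0.2, 0.5, -0.1]))
```

The persistence error of measure (iii) follows from calling `rms_error` with the previous month's observed anomalies in place of the forecast.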

It appears that there is significantly more skill in the forecasts overall when the forecasters have most 
confidence (confidence C) in them at the time of issue. It is encouraging that the PMSL forecasts, 
Figure 6. Four-year running mean skill of forecasts for all ten districts of (a) temperature quints, (b) grouped temperature quints
and (c) rainfall terces. Districts 1 and 5 are highlighted.


Table IV. Seasonal skill scores (percentages) of issued forecasts between 1964 and pre-summer 1986
compared with persistence, for all districts, using the Folland-Painting system


Natural seasons No. of forecasts* Skill Persistence 
(two months) skill 


(a) Temperature (quints) 


Winter (Jan., Feb.) 
Spring 
Pre-summer 
Summer 

Autumn 
Pre-winter 


(b) Temperature (grouped quints) 


Winter 
Spring 
Pre-summer 
Summer 
Autumn 
Pre-winter 


(c) Rainfall (terces) 


Winter 92 4 
Spring 92 =~ 
Pre-summer 92 1 
Summer 88 12 
Autumn 88 5 
Pre-winter 88 <3 


* There are four overlapping monthly forecasts in each season in each year and a forecast for the ten districts in a given month is 
counted as one forecast. 
Table V. Summary of various skills for different confidences of forecast during recent years (see text
for explanation of notation)


                        Confidence                      Twice       Significance*
                                                        standard
                                                        error
                        C       D       E       All


(a) Monthly mean PMSL forecasts


No. of forecasts        17      46      33      96
SS                      75      23      16      30
(RMS)r (mb)             4.31    6.21    6.32    5.96
(RMS)p (mb)             6.82    8.21    8.79    8.19
(RMS)r/(RMS)p           0.63    0.76    0.72    0.73
ra                      0.34    0.16    0.27    0.23    0.17


(b) Monthly temperature and rainfall: Folland-Painting skill scores


Temperature (quints) 29 3

Temperature (grouped quints) 23 6

Rainfall (terces) 40 15


* Using an analysis of variance on the three categories, and assuming N/2 independent monthly forecasts, where N = number of
forecasts.


Note: (RMS)c, the root-mean-square error of a forecast of climatology, is about 0.75 × (RMS)p in principle, so is harder to
improve on than persistence. However, the values of SS and ra for climatology forecasts are in principle zero, illustrating the
difficulty of scoring long-range forecasts expressed in ordinary scientific units.


Twice the standard error is calculated assuming N/2 independent monthly forecasts.


especially, hint at this relationship. Table V indicates, therefore, that an overall measure of consistency 
exists between the skill of the different elements of the forecasts and the quality of the evidence used to 
construct them. This is perhaps one of the best pieces of scientific evidence so far available that 
long-range forecasting can be done at all, that predictability does vary and that the user, to a limited 
extent, can decide on which forecasts to place more reliance. However, the user is most likely to benefit 
from applying this knowledge over an extended period. 


4. Variation of skill throughout the monthly forecast 


High skill in the medium range (days 1-5) will clearly tend to raise the skill of the forecasts averaged 
over the month as a whole. So a false impression could be gained of trends in skill in the truly long range 
from changes in monthly average skill alone. Table VI gives a preliminary indication of the variation of 
sign skill, SS, for different periods within the monthly forecasts since the data were first available in a 
homogeneous form (July 1982). SS is applied to best-estimate forecasts of temperature anomaly and 
rainfall percentage on the medium, mid (day 6-mid-month) and long (second half-month) ranges and to 
PMSL forecasts for the two individual half-months and for the whole month. SS is a very basic measure 
of skill, being based on only two categories, and will generally give larger values of skill than a more 
searching terce or quint scheme. It is adequate to show, though, whether skill exists at all. 

To increase the number of forecasts available for assessment, the forecasts for each district are used. 
However, adjustments have then been made to the nominal number of district forecasts to allow for their 
lack of statistical independence. These adjustments are made separately for temperature and rainfall
forecasts and allow for the correlation of observed district-averaged anomalies (a) in space (between 
districts) and (b) in time (due to the persistence and, sometimes, overlap of observed conditions in 
successive forecast periods); the details are described in Appendix 2. The procedure entails the 
introduction of the factor f;, whose form is derived in Appendix 2, which is used to reduce the apparent 
number of forecasts summed over all districts; f; is shown in Table VI for each forecast period i within the 
month. 


Table VI. Sign skill (SS) of forecasts for periods within a month for the period July 1982 to 
mid-June/mid-July 1986 (see text for explanation of notation)


Medium Mid range First half- Second half- Mid and Whole 
range month month long ranges month 


SS 
(b) Temperature 


SS 
Significance 


Si 
(c) Rainfall 


SS 35 
Significance 5X 10°% 
Si 0.117 


DiN; 960 
* No significance tests available. 


Note: The percentage of forecasts having the correct signs of their anomalies is given by 50 + 0.5 SS. 


Table VI shows that, on the monthly time-scale, the number of statistically independent district
forecasts is typically only about 10-15% of the number issued. This number (for a month) is
considerably less, for example, than that cautiously assumed by Hardy (1980). Estimates of the
statistical significance of SS are based on two tests. Firstly a χ² test (Snedecor and Cochran 1973) is used
to indicate whether the tendency to forecast correctly the sign of the observed anomalies is statistically
significant after allowing for the actual number of observations and forecasts, of both signs, of the
anomaly. For rainfall, the two categories of opposite sign are above and below 100% of average rainfall
respectively. Allowance was also made for a marked tendency to forecast exactly 100% of average
rainfall or zero anomaly of temperature in the mid range or long range. These are regarded, in principle, 
as neither correct nor incorrect. 

A second test, based on the binomial distribution, is used to show whether the observed fraction of 
forecasts having the correct sign of the anomaly is significantly larger than the chance value, which is 
assumed to be 0.5. Both tests give similar results and only their average indication of statistical 
significance is reported. 
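
The second (binomial) test might be sketched as follows. This is not the authors' procedure in detail: the form of the reduction factor is given only in Appendix 2, so the sketch simply assumes that the effective number of independent forecasts is the nominal number multiplied by that factor, rounded to an integer. The function names and input values are hypothetical.

```python
import math


def binomial_upper_tail(n, k, p=0.5):
    """P(X >= k) for X ~ Binomial(n, p)."""
    return sum(
        math.comb(n, j) * p ** j * (1.0 - p) ** (n - j) for j in range(k, n + 1)
    )


def sign_skill_significance(ss_percent, nominal_n, reduction_factor):
    """One-sided binomial significance of a sign-skill value.

    Assumes (for this sketch only) that the effective sample size is
    reduction_factor * nominal_n. The fraction of correct signs is
    taken as 0.5 + 0.005 * SS, i.e. (50 + 0.5 SS) per cent.
    """
    n_eff = max(1, round(reduction_factor * nominal_n))
    fraction_correct = 0.5 + 0.005 * ss_percent
    k_correct = round(fraction_correct * n_eff)
    return n_eff, k_correct, binomial_upper_tail(n_eff, k_correct)


# Hypothetical case: SS = 20% from 1000 district forecasts, factor 0.1.
n_eff, k_correct, p_value = sign_skill_significance(20.0, 1000, 0.1)
print(n_eff, k_correct, round(p_value, 4))
```

The example shows how strongly the allowance for dependence matters: the same 60% success rate that is only marginally significant for 100 effective forecasts would be overwhelmingly significant if all 1000 were treated as independent.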

The following important conclusions can be deduced from Table VI: 

(i) There can be no doubt that the forecasts have appreciable skill in the medium range (days 1-5 of
the forecast which are in practice usually days 2-6 or 3-7 ahead). The skill value of 43% observed for
temperature implies that over 71% of the district temperature forecasts had the correct sign between July
1982 and mid-June/mid-July 1986 (since January 1985 the number with correct sign has averaged nearly
80%).

(ii) The rainfall forecasts tend to be better than the temperature forecasts except in the medium range: 
Table VI has the advantage of providing the same, if very basic, measure of skill for both parameters so 
that a direct comparison of skill is possible. The extra skill of the rainfall forecasts is clearer in the second 
half-month ahead (long range), when it is apparently statistically significant. However since summer 
1985 the temperature forecasts have been more skilful, possibly related to the increased use of 
information about sea surface temperature anomalies near the UK coast (Folland and Woodcock 
1986). 

(iii) The average structure of sign skill through the monthly forecasts is unexpectedly complex. Thus 
the skill of the rainfall forecasts on the monthly time-scale is unexpectedly large (25%) when compared 
with the medium-range time-scale (35%) and is unexpectedly small (smallest) in the mid range (13%). 
However, mid-range skill has apparently improved over the last year. There is a marked overall 
tendency for the skill averaged over a longer forecast period to be larger than its weighted average over 
constituent shorter periods. This is a regular feature even in tables (not shown) for individual years 
constructed in the same way as Table VI, despite large interannual variations in other details of the skill. 
This tendency may result from timing errors in the forecasts which are likely to reduce skill more 
strongly on shorter time-scales than on longer ones — a feature worth closer scrutiny since it could affect 
the perception of ‘predictability’ and the design of future forecasting systems. 
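
The relation between sign skill and the percentage of forecasts of correct sign quoted in (i) can be checked with a short sketch. It assumes SS = 100(C - W)/(C + W), where C and W are the numbers of forecasts of correct and wrong sign; this form is inferred here from the quoted figures (43% and 'over 71%'), and the function names are illustrative.

```python
def sign_skill(n_correct: int, n_wrong: int) -> float:
    """Sign skill in per cent, assuming SS = 100 * (C - W) / (C + W)."""
    return 100.0 * (n_correct - n_wrong) / (n_correct + n_wrong)


def fraction_correct(ss_percent: float) -> float:
    """Invert the assumed definition: fraction correct p = (1 + SS/100) / 2."""
    return (1.0 + ss_percent / 100.0) / 2.0


# Medium-range temperature skill quoted in conclusion (i).
p_medium = fraction_correct(43.0)
print(f"SS = 43% corresponds to {100.0 * p_medium:.1f}% of forecasts with correct sign")
```

On the same assumption, the 'nearly 80%' of forecasts with correct sign since January 1985 would correspond to a sign skill of nearly 60%.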


5. Conclusions 


Despite a substantial reduction in staff effort, the use of a small number of improved forecasting 
techniques and the creation of a more structured forecasting procedure appear to have recently resulted 
in a modest improvement in skill. Skill of course is still low and it remains to be seen whether the 
improvement can be maintained; past history demonstrates that fluctuations in skill are almost 
inevitable in the future. It is hoped that current efforts (a) to introduce regular dynamical forecasts in the
longer ranges, (b) to intensify research into the dynamics and statistical description of low-frequency
weather variability and (c) to exploit information contained in world-wide sea surface temperature
anomaly patterns may allow further slow improvements in technique and performance to take place.


The FP scoring system for long-range forecasts


The fundamental basis for the FP system arose from an unpublished suggestion by Kirk around 1970
that information theory might provide a more flexible and satisfactory approach to assessing long-range
forecasts. This is currently a matter of debate (e.g. Daan 1985). The initial development of Kirk’s ideas
was carried out by Painting (personal communication) in 1975. The FP system can provide a variety of
diagnostics about forecast performance (both best-estimate and probability forecasts). Here attention is
concentrated on deriving Tables I(a)-I(c) used to estimate the skill of best-estimate forecasts.

Fig. A1 shows a hypothetical probability distribution of, say, monthly mean temperature in a given
district and calendar month derived from many years of historical data. A best-estimate forecast, $X_f$, is
made in one of the categories shown (which need not have the same size) and the category in which
the verifying observation, $X_a$, falls is noted. The ‘distance’ between $X_f$ and $X_a$ is defined as the area, $S_{fa}$,
under that part of the probability curve that lies between, and includes, the categories into which $X_f$ and
$X_a$ fall. The information content of the forecast is then defined as:


$$I = -\log_e (S_{fa}) \qquad \text{(A1.1)}$$


This definition is related to the idea of the ‘self information’ of an event in information theory (Jones
1979). It is possible to calculate $I$ for all combinations of forecast and observed values to provide a table
of information values. Table AI shows the resulting information values $I_{ij}$ ($i$ = 1 to 5, $j$ = 1 to 5) for
(equi-probable) forecasts and observations of quints. Note that $I = 0$ for a forecast of quint 5 and an
observation of quint 1 which, quite naturally, has no information content.

Figure A1. The basis of the Folland-Painting system using a probability curve (vertical axis: frequency of occurrence; horizontal axis: category (i)). $X_f$ and $X_a$ are respectively the forecast and
observed values for a variable X, and $S_{fa}$ is the area under the probability curve that lies between, and includes, the categories
into which $X_f$ and $X_a$ fall. The five categories (i) into which $X_f$ and $X_a$ fall do not necessarily have equal areas under the
probability curve.


Table AI. Information scores for forecast and observed quints using the Folland-Painting system


Forecast                Observed quint
quint        1       2       3       4       5

  1        1.61    0.92    0.51    0.22    0
  2        0.92    1.61    0.92    0.51    0.22
  3        0.51    0.92    1.61    0.92    0.51
  4        0.22    0.51    0.92    1.61    0.92
  5        0       0.22    0.51    0.92    1.61
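
The entries of Table AI follow directly from equation (A1.1) when each quint has a probability of 0.2. The sketch below, with an illustrative function name, computes the table for any set of category probabilities by summing the probabilities of the categories between, and including, the forecast and observed categories.

```python
from math import log


def information_table(probs):
    """Return I[i][j] = -ln(S_fa) for forecast category i and observed category j.

    S_fa is the total climatological probability of the categories lying
    between, and including, the forecast and observed categories.
    """
    n = len(probs)
    table = []
    for i in range(n):
        row = []
        for j in range(n):
            lo, hi = min(i, j), max(i, j)
            s_fa = sum(probs[lo:hi + 1])
            row.append(log(1.0 / s_fa))  # same as -ln(S_fa)
        table.append(row)
    return table


quint_table = information_table([0.2] * 5)
for row in quint_table:
    print("  ".join(f"{value:4.2f}" for value in row))
```

Unequal category probabilities can be passed in the same way, which corresponds to categories of unequal area in Fig. A1.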


It is desirable for scientific purposes to provide a set of information values $I'_{ij}$ whose average would be
zero if the forecasts were unrelated statistically to the observations. This can be done, e.g. for a set of
quint categories, by making a long series of forecasts where the quint categories of successive forecasts
are chosen using a random number generator which operates on a uniform distribution of numbers in
the range 1 to 5. The expected result of this operation can be achieved using a standard mathematical
result:


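[Equation (A1.2), giving $I'_{ij}$ in terms of $I_{ij}$ and the chance probabilities defined below, with sums over $i$ = 1 to 5 and $j$ = 1 to 5, is not legible in the source.]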
where $P_{ij}$ is the chance probability of an observation of category $i$ and a forecast of category $j$, $P_{i|j}$ is the
chance probability of category $i$ happening given that the forecast category $j$ occurred, etc. We note also
that in the quint table:


(A1.3) 
The resulting values, $I'_{ij}$, are called the ‘effective information gain’ values above a (random) chance level.
To help comparison with the older Sutcliffe scoring system, we normalize the effective information gain
values to a new set, $S_{ij}$, so that the long-term average score for a correct forecast is 4 (allowing for the
long-term probabilities of each correct category; in a quint scheme these are 0.2 for each correct
category). Tables I(a)-I(c) result from appropriate applications of this procedure, and assume that the
forecasts and observations of each category are made with their expected probabilities (i.e. 0.2 for
forecasts and observations of quints and 0.333 for those of terces). In recent years these assumptions
have been reasonably acceptable for quints but before 1981 the number of forecasts of extreme
temperature (quints 1 and 5) was consistently much too low, only about 1/5 of the number observed (see
also section 3(b)). The problem will be discussed in a future paper; it is enough to note that the FP system
can automatically be adjusted (via equation (A1.2)) to deal with this problem. Flood and Weller (1969)
provide an interesting discussion of the possible consequences for skill when the Sutcliffe scoring system
is not adjusted to allow for an insufficient number of forecasts of extremes.
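
The two steps described above (removing the chance level, then normalizing so that a correct forecast scores 4 on average) can be sketched as follows. This is only one way of obtaining the stated properties and is an assumption made here, not a transcription of equation (A1.2): for each forecast category the chance expectation of the information value is subtracted, and the result is rescaled. It should not be expected to reproduce Tables I(a)-I(c), and all function names are illustrative.

```python
from math import log


def information_table(probs):
    """I[i][j] = -ln(S), S being the total probability of the categories
    between, and including, observed category i and forecast category j."""
    n = len(probs)
    return [
        [log(1.0 / sum(probs[min(i, j):max(i, j) + 1])) for j in range(n)]
        for i in range(n)
    ]


def effective_gain(info, obs_probs):
    """Subtract from each forecast column j its chance expectation,
    sum over i of P(observed i) * I[i][j] (an assumed form)."""
    n = len(obs_probs)
    gain = [[0.0] * n for _ in range(n)]
    for j in range(n):
        expected = sum(obs_probs[i] * info[i][j] for i in range(n))
        for i in range(n):
            gain[i][j] = info[i][j] - expected
    return gain


def normalise(gain, obs_probs, target=4.0):
    """Rescale so that the probability-weighted mean score of a correct
    forecast (the diagonal of the table) equals `target`."""
    n = len(obs_probs)
    mean_correct = sum(obs_probs[i] * gain[i][i] for i in range(n))
    scale = target / mean_correct
    return [[scale * value for value in row] for row in gain]


quints = [0.2] * 5
gain = effective_gain(information_table(quints), quints)
scores = normalise(gain, quints)
for row in scores:
    print("  ".join(f"{value:6.2f}" for value in row))
```

With this construction a forecaster whose forecasts are unrelated to the observations has an expected score of zero whatever mixture of categories is forecast, which is the property required of the effective information gain.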


Appendix 2 — Calculation of the equivalent number of independent district forecasts 


Let there be $N_i$ forecasts for a given period $i$ within the month (including the month itself) made over a
period of time for $D_i$ districts. The total of $D_i$ times $N_i$ forecasts is modified to an equivalent number of
independent forecasts $N'_i$ given by:


$$N'_i = a_i D_i b_i N_i = f_i D_i N_i$$


where $f_i = a_i b_i$ and $b_i$ is the reducing factor that estimates the effective number of independent forecasts
made through time for a given district due (mainly) to the persistence of observed anomalies between
successive forecasts, and $a_i$ is the reducing factor that estimates the effective number of independent
districts mainly because of the high spatial correlation of observed anomalies between districts. Thus
$a_i < 1$ and $b_i < 1$. The high spatial correlation of temperature and rainfall anomalies means that the
number of truly independent districts is much less than the ten for which forecasts are made. Using data
for the period July 1982 to mid-June/mid-July 1986, estimates of $a_i$ were made using the formula
(Yevjevich 1972):


[The expression for $a_i$ in terms of $D_i$, $\bar{r}_i$ and $K_i$ is not legible in the source.]


where $\bar{r}_i$ is the average correlation of the observed anomalies in each district with those in every other
district and $K_i$ is a complex factor that allows for the variation in the standard deviation of the
temperature anomalies or rainfall percentages between districts and between different months of the
year and has a value of a little below unity. The value of $a_i$ varies only slowly with the choice of forecast
period $i$ and averages rather over 0.1 for temperature and about 0.15 for rainfall. The value of $b_i$ has been
estimated from the following approximate expression adapted from Yevjevich (1972):


$$b_i N_i = \frac{N_i}{1 + 2(\bar{r}_{1i} + \bar{r}_{2i} + \bar{r}_{3i})}$$


where $\bar{r}_{1i}$ is the average correlation (over all districts) of observed anomalies (for given forecast period
length $i$) in successive forecast periods, $\bar{r}_{2i}$ is the average correlation of observed anomalies with those in
the next but one forecast period, etc. Values of $\bar{r}_{4i}$ and beyond were insignificantly different from zero
for all lengths of forecast period within the month and so were not used.
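
A minimal sketch of the calculation follows. It assumes that the temporal factor is $b_i = 1/(1 + 2(\bar{r}_{1i} + \bar{r}_{2i} + \bar{r}_{3i}))$, takes the spatial factor $a_i$ as a given input rather than computing it, and uses hypothetical numbers; the function name is illustrative.

```python
def effective_forecast_count(n_forecasts, n_districts, a, lag_correlations):
    """Equivalent number of independent district forecasts, N' = a * b * D * N.

    a                 spatial reducing factor (supplied, not computed here)
    lag_correlations  average correlations of observed anomalies at lags of
                      1, 2, 3 forecast periods (higher lags neglected)
    """
    b = 1.0 / (1.0 + 2.0 * sum(lag_correlations))
    return a * b * n_districts * n_forecasts


# Hypothetical example: 48 monthly forecasts for 10 districts, a = 0.1,
# and a lag-1 correlation of 0.1 with higher lags taken as zero.
n_eff = effective_forecast_count(48, 10, a=0.1, lag_correlations=[0.1, 0.0, 0.0])
print(f"{48 * 10} district forecasts issued, about {n_eff:.0f} effectively independent")
```

Setting all lag correlations to zero gives $b_i$ = 1, in which case the reduction comes from the spatial factor alone.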


Snow forecasts from the Meteorological Office fine-mesh model during
the winter of 1985/86


By T. Davies and Olive Hammon 
(Meteorological Office, Bracknell) 


Summary 


The performance of the Meteorological Office fine-mesh model in forecasting snow during the winter of 1985/86 is examined. 
The snow predictor currently used in the model is compared with other possible predictors to see whether an alternative predictor 
could provide more precise guidance. 


1. Introduction 


The fine-mesh model is one of the important sources of guidance for forecasters in the United 
Kingdom and, in his assessment of the current state of short-range weather forecasting, Woodroffe 
(1984) states that the model is the best tool at the forecaster’s disposal for forecasting snow 24 hours or so 
ahead. A basic description of the fine-mesh model can be found in an article by Gadd (1985). 

The forecasting of snow is a two-stage process. The first stage is to decide whether or not precipitation 
is likely, and the second stage is to forecast the temperature structure of the near-surface layer of the 
atmosphere. This second stage compounds the forecasting problem since small temperature errors may 
imply the wrong form of precipitation. The main purpose of this paper is to examine the performance of 
the fine-mesh model with regard to the second stage, but firstly, a few remarks are required as to the 
quality of the model’s precipitation forecasts.

The fine-mesh model is a grid-point model with a grid length over the United Kingdom of about 
75 km. As a result of approximations used in the modelling process, accurate detail cannot be achieved 
on a scale below one or two grid lengths, about 100 km. The enhancement of precipitation due to 
orographic effects and local convection cannot therefore be simulated realistically by the model. 

Errors of about one grid length in forecasting precipitation are often unimportant as the movement of 
systems makes them appear as minor timing errors. However, in slow-moving or quasi-stationary 
situations, such an error can imply the wrong character of weather for a whole region, but even with this 
degradation, the fine-mesh model has proved to be very useful in assessing the general distribution and 
amounts of rain. Up to the winter of 1985/86 there was no regular objective verification of precipitation 
forecasts from the model, though since 1962 a partially objective statistic has been produced in the 
Central Forecasting Office (CFO) at Bracknell based on the forecast of precipitation for London made 
by the senior forecaster (Woodroffe 1984, Flood 1985). This statistic shows that the skill of the forecast 
does appear to have improved in recent years and this improvement is mainly attributed to the increased 
accuracy of fine-mesh model guidance. 


2. Snow prediction 


Whether snow melts or not before reaching the ground depends on the temperature and humidity near 
the surface and the rate of precipitation. Forecasting the low-level atmospheric structure is difficult and 
to overcome this, attempts have been made to identify snow predictors which can be forecast more 
readily. For example Boyden (1964) examined a number of predictors giving the probability of snow 
and recommended the use of the 1000-850 mb thickness corrected for mean-sea-level pressure (MSL) by
adding a factor (MSL - 1000)/4; this predictor will be referred to as 1000-850 P henceforth. A further
correction for station height is also required (arrived at by subtracting station height in metres divided
by 30).
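
The predictor as described above can be written as a short function. The sketch assumes thickness in geopotential metres, MSL pressure in millibars and station height in metres; the function and argument names are illustrative.

```python
def boyden_predictor(thickness_gpm, msl_mb, station_height_m=0.0):
    """1000-850 P: thickness corrected for MSL pressure and station height.

    Adds (MSL - 1000)/4 and subtracts station height (m) divided by 30.
    """
    return thickness_gpm + (msl_mb - 1000.0) / 4.0 - station_height_m / 30.0


# Hypothetical values: thickness 1296 gpm, MSL pressure 1008 mb, station at 90 m.
p_value = boyden_predictor(1296.0, 1008.0, station_height_m=90.0)
print(f"1000-850 P = {p_value:.1f} gpm")
```

At sea level with an MSL pressure of 1000 mb both corrections vanish and the predictor equals the thickness itself.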

The 1000-850 P has become one of the most widely used predictors chiefly because it is usually not too 
difficult to predict the 1000-850 mb thickness. With the advent of the 10-level model, and in particular 
the limited-area version (the rectangle), it was appreciated that the model forecast of the 1000-850 mb 
thickness was reliable enough for the predictors to be displayed as part of the model output. Therefore 
the 1000-850 P was used to show the probability of snow occurring by adding lines of snow probability 
(80%, 50% and 20%) to the form of output used by the forecaster. The use of the snow-probability lines 
based on 1000-850 P has continued with the introduction of the higher-resolution fine-mesh model. The 
snow-probability lines are mean-sea-level values which need to be adjusted to suit local terrain. However 
the fine-mesh model orography is too smooth (because the grid length is insufficient to resolve local 
detail) to make realistic adjustments to the snow probability at each grid point. Fig. 1 shows the 
orography currently used by the fine-mesh. There are many places where the model orography differs 
substantially from reality. Note for example, the absence of valleys. 

















Figure 1. Part of the operational fine-mesh orography. Lines drawn every 50 metres. 


Forecasters in CFO have recognized for some time that in the case of precipitation ahead of active 
warm fronts, the 1000-850 P predictor underestimates the probability of snow. The 20% snow-line in 
such instances is usually interpreted as defining a 50% actual probability of snow (Hunt 1985). The 
discrepancy is not a result of model errors in forecasting the 1000-850 mb thickness. Indeed, since the 
1000-850 mb thickness of 1300 gpm (corresponding to a 20% probability of snow when MSL is 
1000 mb) is considered to be important, the forecaster in CFO has forecast charts of 1000-850 mb 
thickness available so that they can be examined in marginal situations and used for forecasting snow 
over high ground. 

Boyden pointed out possible reasons for the failure on some occasions of 1000-850 P as a predictor. 
Firstly, the layer of air above the freezing level contributes to the thickness but is not relevant to the form 
of precipitation. Secondly, the 1000-850 mb thickness is relatively insensitive to the lowering of the
freezing level caused by melting snow. Both factors taken together are important when considering 
active warm fronts, particularly in situations where an inversion just above the surface is undercut by 
very cold air. 

The efficacy of 1000-850 P as a predictor is partly due to the experience and skill of the user. However, 
major snowfall events are often finely balanced, particularly in southern Britain on occasions when 
warm air from the south-west comes up against very cold and dry continental air. When using 
1000-850 P, many forecasters compensate for warm fronts as well as attempting to assess the likely 
structure of the low-level air by studying upstream dew-points and temperatures. 

In practice a single-valued predictor would be more helpful. If a representative upper-air ascent is 
available or the low-level structure can be forecast, many forecasters choose to use the wet-bulb freezing 
level as a predictor. In an operational trial of snow predictors, Lowndes et al. (1974) suggested that the 
wet-bulb freezing level was the most efficient predictor. To make effective use of wet-bulb freezing level 
or any other predictor relying on information close to the surface in a numerical model requires an 
accurate simulation of the near-surface layers. Before the winter of 1985/86, forecast surface and 
boundary-layer temperatures from the fine-mesh were not accurate enough, but the introduction of a 
scheme for modelling the soil heat flux has improved the forecast temperatures markedly. It is therefore 
worthwhile examining some other predictors to see whether their performance matches that of 
1000-850 P. 


3. A comparison of snow predictors using the fine-mesh model 


The only routine verification of snow forecasts produced by the fine-mesh model is a subjective 
assessment made by the senior forecaster in CFO of the 24-hour forecast. The assessment is based on the
position of the snow-probability lines over the United Kingdom and on the forecast precipitation area. It 
is made only when the forecast pressure pattern is considered to be good, so that cases of incorrect model 
evolution over the United Kingdom are excluded. The performance of the model during the period 
26 December 1985 to 7 April 1986 is summarized in Table I. 


Table I. CFO subjective assessment of fine-mesh snow forecasts at T+24 hours, during the period
26 December 1985-7 April 1986


Score   Criteria                                                  Number of forecasts

A       Snow well forecast                                        28
B       Snow slightly over/under estimated in amount or extent    47
C       Snow badly over/under estimated in amount or extent       29


Considering the 29 forecasts scored as ‘C’, 14 underestimated amounts of snow, 8 overestimated and the 
remaining 7 were a combination of precipitation error and forecast thickness error. The majority of 
forecasts scored as ‘C’ were due mainly to errors in precipitation rather than thickness, and several 
others to the model’s underestimation of areas of very light snow during February. These results show an 
encouraging degree of skill (over 70% score A or B, excluding cases with major evolution errors) in the 
prediction of snow. 
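
The percentage quoted above can be checked from the counts in Table I; the short sketch below does only that arithmetic, with the score letters A to C attached to the three rows in the order in which they appear.

```python
# Counts from Table I (CFO subjective scores at T+24 hours).
counts = {"A": 28, "B": 47, "C": 29}
total = sum(counts.values())
share_a_or_b = 100.0 * (counts["A"] + counts["B"]) / total

# Breakdown of the forecasts scored 'C', as given in the text.
c_breakdown = {"underestimated": 14, "overestimated": 8, "combined errors": 7}
c_total = sum(c_breakdown.values())

print(f"{total} forecasts assessed; {share_a_or_b:.1f}% scored A or B")
print(f"'C' breakdown accounts for {c_total} of {counts['C']} forecasts")
```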

It was not possible to reproduce the above verification using other predictors so an objective 
comparison has been made using 11 cases from the period covered by the above assessment. These
11 cases were chosen because on each occasion the forecasting of significant snowfall was finely
balanced. The predictors examined were as follows: 

(a) 1000-850 P. 

(b) Dry-bulb freezing level. 

(c) Wet-bulb freezing level. 

(d) Mean temperature of the lowest 100 mb above the ground. 

Table II shows the probability of snow for particular values of the first two predictors derived by 
Boyden. The difference between 30% and 70% snow probability using 1000-850 P is less than 10 gpm. 
For this predictor to be useful, the fine-mesh model needs to forecast the 1000-850 mb thickness within 
1% accuracy. The forecast values at T+24 hours of the 1000-850 mb thickness at nine UK/Irish
upper-air stations were compared with the actual radiosonde values in the 11 cases assessed. The mean
error was found to be 1.5 gpm and the root-mean-square (r.m.s.) error 6 gpm, comfortably within the
range required. In one case thickness errors greater than 10 gpm were found to be due to an inaccurate
evolution. 


Table II. Snow probabilities derived from values of the 1000-850 mb thickness and dry-bulb freezing
level


Percentage probability of snow      70      50      30

1000-850 P (gpm)                  1290    1293    1298
Dry-bulb freezing level (mb)        25      35      45
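
For illustration, intermediate probabilities can be read off the 1000-850 P row of Table II by interpolation. The sketch below assumes linear interpolation between the tabulated points, which is a convenience adopted here and not part of Boyden's method, and adds the 20% value at 1300 gpm quoted in section 2. Outside the tabulated range it returns `None` rather than extrapolating.

```python
from bisect import bisect_right

# (1000-850 P in gpm, probability of snow in per cent)
SNOW_TABLE = [(1290.0, 70.0), (1293.0, 50.0), (1298.0, 30.0), (1300.0, 20.0)]


def snow_probability(p_value):
    """Linearly interpolate snow probability (%) from the tabulated points."""
    xs = [x for x, _ in SNOW_TABLE]
    if p_value < xs[0] or p_value > xs[-1]:
        return None  # outside the tabulated range: no value is offered
    k = min(bisect_right(xs, p_value), len(xs) - 1)
    (x0, y0), (x1, y1) = SNOW_TABLE[k - 1], SNOW_TABLE[k]
    return y0 + (y1 - y0) * (p_value - x0) / (x1 - x0)


for value in (1290.0, 1295.5, 1300.0, 1310.0):
    print(value, snow_probability(value))
```

The steepness of the relation is evident: a change of only 8 gpm takes the probability from 70% to 30%, which is why small thickness errors matter.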


To use the dry-bulb freezing level as a predictor, the model needs to predict the freezing level to within 
20 mb. Lowndes et al. (1974) derived values for wet-bulb freezing levels for showery and non-showery 
precipitation which show even less margin for error. Comparison of forecast dry- and wet-bulb freezing 
levels at T+24 hours with radiosonde values for the 11 chosen cases gave the following results. For the 
dry-bulb freezing level the mean error was found to be 15 mb and the r.m.s. error 26 mb; for the wet-bulb 
freezing level the mean error was found to be 20 mb and the r.m.s. error 29 mb. The percentage of 
forecasts correct within 20 mb was 72% for the dry-bulb freezing level. These figures demonstrate that 
the freezing level in the model is not sufficiently reliable to use as a predictor for snow. 
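
The three statistics used in this comparison (mean error, r.m.s. error and percentage of forecasts within a tolerance) can be computed as in the sketch below. The forecast and observed values shown are hypothetical and serve only to exercise the function, whose name is illustrative.

```python
from math import sqrt


def verification_stats(forecast, observed, tolerance):
    """Return (mean error, r.m.s. error, fraction with |error| <= tolerance)."""
    errors = [f - o for f, o in zip(forecast, observed)]
    n = len(errors)
    mean_error = sum(errors) / n
    rms_error = sqrt(sum(e * e for e in errors) / n)
    within = sum(abs(e) <= tolerance for e in errors) / n
    return mean_error, rms_error, within


# Hypothetical freezing levels (mb) at four stations; not the paper's data.
forecast_levels = [930.0, 955.0, 900.0, 980.0]
observed_levels = [920.0, 950.0, 930.0, 960.0]
mean_err, rms_err, frac_ok = verification_stats(forecast_levels, observed_levels, 20.0)
print(f"mean error {mean_err:.2f} mb, r.m.s. error {rms_err:.2f} mb, "
      f"{100.0 * frac_ok:.0f}% within 20 mb")
```

An r.m.s. error larger than the tolerance, as found for both freezing levels, indicates that errors exceeding the required accuracy are not rare.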

The mean temperature of the lowest 100 mb above the ground (known as M100 henceforth) has
recently been suggested as a possible snow predictor. However M100 would need to be corrected if the 
fine-mesh orography differed significantly from the actual orography. Suggested values for this 
predictor, derived from comparison with observations by W. Hand (personal communication) are 
shown in Table III. 


Table III. Mean temperature of lowest 100 mb above ground used to predict type of precipitation at
the surface


Mean temperature of the lowest      Precipitation type
100 mb above ground (°C)            at surface

Less than -1.5                      Snow
-1.5 to 0.5                         Sleet
More than 0.5                       Rain

In tests using M100, the 0.5 °C isotherm gave useful guidance for the position of the rain/sleet
boundary, but -0.5 °C seemed more appropriate for the sleet/snow boundary than the -1.5 °C
suggested. This is perhaps evidence of a slightly warm bias in the model at low levels.
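
The classification in Table III, and the alternative sleet/snow threshold suggested by these tests, can be expressed as a small function. The thresholds are left as parameters so that either set of values can be tried; the function name and the treatment of values exactly on a threshold (classed as sleet) are choices made here.

```python
def precipitation_type(m100_c, snow_below=-1.5, rain_above=0.5):
    """Classify surface precipitation from M100 (degrees Celsius).

    Defaults follow Table III; values between the two thresholds,
    inclusive, are classed as sleet.
    """
    if m100_c < snow_below:
        return "snow"
    if m100_c > rain_above:
        return "rain"
    return "sleet"


for m100 in (-2.0, -1.0, 1.0):
    table_iii = precipitation_type(m100)
    adjusted = precipitation_type(m100, snow_below=-0.5)
    print(f"M100 = {m100:+.1f} °C: Table III gives {table_iii}, "
          f"adjusted threshold gives {adjusted}")
```

A value of -1.0 °C, for example, is sleet with the Table III thresholds but snow with the -0.5 °C boundary found more appropriate for the model.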


4. The predictors in action — fine-mesh model case study, 7 January 1986 

By 06 GMT on 7 January, a warm front was approaching south-west England, bringing moderate or
heavy rain and sleet to southern Cornwall. As the front continued to push slowly northwards, the rain 
soon turned to snow and sleet inland, especially over the higher ground in southern England and Wales. 
The snow reached the Midlands during the afternoon and extended into East Anglia and north-west 
England during the evening. Over southern England and South Wales the snow did not last long and was 
followed by rain or sleet, but further north the snow persisted. Fig. 2 shows the synoptic situation at 
18 GMT, with the heaviest snow over the Midlands. During the evening, the warm front became 
quasi-stationary from Sussex to central Wales, with moderate or heavy snow to the north of this line. 

Figure 2. The synoptic situation at 18 GMT on 7 January 1986.

The fine-mesh forecast from data time 12 GMT on 6 January predicted these developments and
enabled forecasters to issue advance warnings of snow with a high degree of confidence. Fig. 3 shows
the fine-mesh model’s forecast precipitation area and snow-probability lines for 18 GMT on 7 January.


























Figure 3. Fine-mesh 30-hour forecast of surface pressure and intensity of precipitation for 18 GMT on 7 January 1986. The 
pecked lines are snow probabilities. 


This was a very good 30-hour forecast from the fine-mesh model with an accurately forecast 
precipitation area. The only defect is that the area of heavier precipitation does not extend far enough 
northwards into the Midlands, but the model is only one grid length in error. The evolution is correct 
and errors in the forecast 1000-850 mb thickness at 12 GMT on 7 January and 00 GMT on 8 January
were very small. From Figs 2 and 3 it can be seen that the observed snow area lies between the forecast
positions of the 20% and 50% snow-probability lines. For the area of heaviest precipitation over the
Midlands, a correction of 1 to 4 gpm must be subtracted from the 1000-850 P to allow for station height
above mean sea level. This effectively increases the forecast probability of snow to 50% or more and is 
excellent guidance for the Midlands, received more than 24 hours in advance. Figs 4 and 5 show the 
model’s prediction for the height above mean sea level of the dry- and wet-bulb freezing levels 
respectively. Values greater than 2000 feet have been shaded to indicate areas of low risk of snow. Over 
the Midlands, for example, both the dry- and wet-bulb freezing levels are less than 1000 feet, giving an 
indication of the wintry precipitation expected. The main error is over Sussex and Kent, where forecast 
freezing levels of 1500 to 2000 feet indicate that the model had pushed the warm air slightly too far east. 
The fine-mesh model’s 1.5 m screen temperature forecast for 18 GMT is shown in Fig. 6. Only the 
0-3 °C isotherms have been shown, with the shaded area indicating temperatures greater than 3 °C. 
Forecast temperatures over the Midlands and much of Wales were 0-1 °C; an accurate forecast which 
would have helped to confirm the probability of snow rather than rain. Fig. 7 shows the critical values of 
the M100 snow predictor. The forecast positions of the 0.5, -0.5 and -1.5 °C isotherms are shown,
whilst the shaded area indicates a temperature of more than 0.5 °C. Bearing in mind that a small
negative correction must be made to the value over Wales due to inaccurate fine-mesh orography, the
position of the 0.5 °C isotherm accurately marks the boundary between rain and sleet, whilst the -0.5 °C
isotherm gives the sleet-snow boundary. The -1.5 °C isotherm suggested from comparison with
observations is too far north in this case.

Meteorological Magazine, 115, 1986

Figure 4. Fine-mesh 30-hour forecast of the height of the dry-bulb freezing level for 18 GMT on 7 January 1986. Isopleths are
labelled in thousands of feet with the shaded area > 2000 feet.

Figure 5. As Fig. 4 but for wet-bulb freezing level.

Figure 6. Fine-mesh 30-hour forecast of the 1.5 m temperature for 18 GMT on 7 January 1986. Isopleths are degrees Celsius
and shaded area > 3 °C.

Figure 7. As Fig. 6 but for mean temperature of the lowest 100 mb. Shaded area is > 0.5 °C.

This was one of the best fine-mesh snow forecasts of the winter and it gave good guidance of probable 
snow areas 24 hours in advance. The main error was over Sussex and Kent where the model guidance 
was for rain rather than sleet. However, the borderline nature of the weather in this area was indicated by 
the report of moderate to heavy sleet on the south coast with a temperature higher than at Manston in 
Kent, where moderate rain was reported.


5. Conclusion 


The 1000-850 mb thickness, adjusted for mean-sea-level pressure, appears to be the most useful 
predictor of snow when used with the fine-mesh model. The main advantage over other predictors is in 
the model’s accuracy in forecasting the 1000-850 mb thickness. However, because there is a wide range 
of values over which the transition from rain to snow may occur, much of the success of 1000-850 P 
depends upon the experience of the user. The other predictors examined have a smaller range of values 
over which the transition from rain to snow may occur, but the fine-mesh model is not yet able to 
forecast these predictors as accurately as 1000-850 P. 

Further improvements in the modelling of the boundary layer are envisaged in the near future, and the 
performance of the predictors will be re-examined. On the scale of the fine-mesh, it is unrealistic to 
expect a definitive solution in finely balanced situations. This may be of small comfort to the airfield 
forecaster, forecasting for more than 12 hours ahead, but the advent of higher-resolution mesoscale 
models may improve matters. 


Report on the sighting of aurora borealis at Royal Air Force Lyneham


By P.J. Smith 
(Meteorological Office, Royal Air Force Lyneham) 


Auroras are caused by a stream of charged solar particles (the solar wind) being focused by the earth’s 
magnetic field. As the particles enter the high atmosphere they collide with the molecules of the various 
atmospheric gases which then become ‘excited’, i.e. the molecules change their internal energy state. 
When they subsequently decay to their normal energy state they emit packets of energy at visible 
wavelengths, usually red, green or yellow. This release of energy is often organized and the resulting 
patterns are in the form of rays or curtains of coloured light. A more detailed discussion of auroras can
be found in Falck-Ytter (1985)*. 

Auroral displays are usually found in the zone between 65 and 70° latitude. Occasionally they form at 
lower latitudes, but it is rare for displays to be clearly visible in southern England. It was therefore with 
great interest that I watched a spectacular auroral display from Lyneham Meteorological Office 
(51° 30’N, 01° 59’W) during the night of 8/9 February 1986. There was little cloud and the visibility was 
good, between 6 and 10 km. I was the only observer, with obligations to Air Traffic Control, so my 
observations of the aurora are necessarily simplified and generalized with only approximate times. Also, 
since the aurora displayed a great variety of activity, I have endeavoured to report only the major
changes. 

At 2020 GMT a homogeneous arc appeared from the west-north-west to the north-east, faint, white in 
colour and at an elevation of about 5°. By 2035 GMT the display had developed into a rayed arc, 
moderately bright, pale green in colour and with several small rayed bands separate to, but overlapping, 
the base of the arc (Fig. 1(a)). At 2100 GMT it faded to a faint homogeneous arc, north-north-west to
north-north-east, and rose to 10° elevation. However, by 2145 GMT the display had developed again
into a homogeneous band, of moderate brightness, greenish hue, orientated west-north-west to
east-north-east, with the base having risen slightly to 15-20° elevation. At this time one or two rays began to
appear and by 2200 GMT had developed into two prominent rayed areas (Fig. 1(b)). The aurora
continued to rise, eventually reached 30° elevation and stretched almost from east to west. It continually
changed form, occasionally breaking into ‘curtain-like’ formations, with frequent single rays reaching
up to 60° (Fig. 1(c)). The aurora continued in this manner until 2330 GMT. There was also considerable
meteor activity with one remarkable sighting at 2130 GMT of a very bright green meteor that appeared
briefly in the west at 30° elevation.

Figure 1. Sketches of auroral displays (a) at 2035 GMT, (b) at 2200 GMT and (c) around 2330 GMT.

* Falck-Ytter, H.; Aurora. The northern lights in mythology, history and science, Floris Books, Edinburgh, 1985.

From 2330 GMT onwards, the aurora varied in activity, sometimes becoming almost homogeneous, 
but remaining as a broad band, moderate in brightness and faintly green. At 0130 GMT it faded to a 
faint homogeneous arc, 30° in elevation, occasionally displaying some rays, but probably partially 
obscured by thickening haze. By 0200 GMT it had disappeared. 


551.571.36(417) 


High absolute humidities in Ireland, 12-13 July 1983 


By S.D. Burt 
(Sandhurst, Berkshire) 


Summary 


An occasion of prolonged high absolute humidities in central and western Ireland during July 1983 is described. It is suggested that 
the event is probably the most extreme of its type on record for the British Isles. 


A recent note in this magazine (Lewis 1986)* drew attention to the occurrence of high absolute
humidity in parts of England on 1 July 1968. Readers may be interested in a more recent occasion of
even higher, and longer-lasting, high absolute humidity, not in England on this occasion but over
central and western Ireland, on 12-13 July 1983.

The synoptic situation at 1200 GMT on 12 July 1983 was as illustrated in Fig. 1. The British Isles lay 
under the influence of an anticyclone centred between Iceland and Scotland; surface winds were light 
north or north-easterly in most districts. Over all but the extreme north and west of Scotland (where a 
weak cold front had introduced cloud and rain) and a narrow strip down the east and north-east of 
England (where fog and low stratus prevailed) the day was exceptionally warm. Afternoon temperatures 
exceeded 30°C over the whole of England and Wales away from the coast (reaching 32°C in the 
Southampton area and in south Wales) and even 31 °C in southern Scotland. 

Over Ireland the weather was also hot, but in the south, cloudier weather with scattered 
thunderstorms had developed overnight, somewhat in advance of another weak cold front associated 
with a depression to the west of Spain. Humidities were already high, and mist and low cloud developed 
widely as temperatures fell. Even so, night temperatures were uncomfortably high; at Birr the overnight 
minimum was 17.0 °C, and at Roche’s Point 16.9 °C (see Fig. 2 for locations). However, as the day wore
on, the mist and low cloud cleared and further breaks in the medium cloud cover, together with a 
westward drift of warm air from England and Wales, allowed temperatures to rise quickly. Meanwhile, 
the cold front advancing slowly north-east continued to spread moist Atlantic air before it over southern 
and central Ireland. As a result central and western districts of Ireland experienced exceptionally high 
absolute humidities for most of the day. 





* Lewis, R.P.W.; An occasion of high absolute humidity in England: 1 July 1968, Meteorol Mag, 115, 1986, 115-117.

Figure 1. Surface analysis for 1200 GMT on 12 July 1983.

Figure 2. Locations of places referred to in the text. Other synoptic stations in Ireland are marked but not individually identified.


The first report of a dew-point in excess of 20 °C came from Kilkenny at 0900 GMT: 21.3 °C with a
dry-bulb temperature of 24.7 °C, under clear skies with only 3 knots of wind. By 1200 GMT dew-point
temperatures at Kilkenny, Birr, Galway, Claremorris, Mullingar and Clones were all above 20 °C; at
Birr 23.2 °C was reported, associated with a dry-bulb temperature of 26.0 °C, under 6 oktas of
stratocumulus, with mist (visibility 2000 metres) and in conditions of flat calm. At this site the
dew-point remained above 20 °C for 16 consecutive hours, although it did not subsequently exceed the
1200 GMT value. Most other sites, however, reported their highest values of dew-point during the late
afternoon or early evening. Fig. 3 shows the surface observations for 1800 GMT; by this time the surface
cold front had been omitted from the Atlantic analysis, although a consideration of the wind and
dew-point fields would seem to indicate its remains across southern Ireland at about 52.5°N.

Figure 3. Surface observations at 1800 GMT on 12 July 1983.


At 1900 GMT Shannon reported the highest dew-point temperature for the British Isles known to the 
author, namely 23.8°C. At this observation, the dry-bulb temperature was 28.1°C (wet-bulb 
temperature about 25.1 °C, relative humidity about 77%, vapour pressure about 29.5 mb), with 3 oktas 
altocumulus floccus, and a 5-knot north-easterly breeze. The heavy thunderstorm that broke within the 
next hour must have brought welcome relief; at 2000 GMT the temperature was down to 24.0 °C, 
accompanied by an 18-knot easterly breeze. 
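The humidity figures quoted for Shannon can be checked approximately from the dry-bulb and dew-point temperatures alone. The sketch below is illustrative only: it assumes a Magnus-type approximation for saturation vapour pressure with one commonly used set of coefficients, since the article does not say how its values were derived, and the function and variable names are hypothetical.

```python
import math

# Magnus-type coefficients over water: a commonly used set, assumed here
# because the article does not say how its humidity values were derived.
E0_MB = 6.112      # saturation vapour pressure at 0 degC, in mb (hPa)
MAGNUS_B = 17.62   # dimensionless
MAGNUS_C = 243.12  # degC
R_VAPOUR = 461.5   # specific gas constant of water vapour, J/(kg K)


def saturation_vapour_pressure_mb(temp_c):
    """Saturation vapour pressure over water (mb) at temp_c (degC)."""
    return E0_MB * math.exp(MAGNUS_B * temp_c / (MAGNUS_C + temp_c))


def humidity_summary(dry_bulb_c, dew_point_c):
    """Return (vapour pressure mb, relative humidity %, absolute humidity g/m^3)."""
    # The actual vapour pressure equals the saturation value at the dew-point.
    vapour_pressure = saturation_vapour_pressure_mb(dew_point_c)
    relative_humidity = 100.0 * vapour_pressure / saturation_vapour_pressure_mb(dry_bulb_c)
    # Ideal-gas law for water vapour: density = e / (R_v * T), e in Pa, T in K.
    absolute_humidity = 1000.0 * (vapour_pressure * 100.0) / (R_VAPOUR * (dry_bulb_c + 273.15))
    return vapour_pressure, relative_humidity, absolute_humidity


# Shannon, 1900 GMT on 12 July 1983 (temperatures taken from the text).
e_mb, rh_pct, rho_g_m3 = humidity_summary(dry_bulb_c=28.1, dew_point_c=23.8)
print(f"vapour pressure {e_mb:.1f} mb, RH {rh_pct:.1f}%, absolute humidity {rho_g_m3:.1f} g/m^3")
```

With these assumed coefficients the sketch gives a vapour pressure close to 29.4 mb and a relative humidity between 77 and 78%, in line with the approximate values quoted above, and an absolute humidity a little over 21 g per cubic metre.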

Table I lists dry-bulb and dew-point temperatures for eight sites in central and western Ireland for 
24 hours commencing 0600 GMT 12 July, while Fig. 4 presents a sequence of observations made at 
Kilkenny, Shannon and Birr over the same period. 

Temperatures and humidities remained high throughout the night of 12/13 July and for most of the 
following day (although generally not as extreme as on 12 July). At Birr the overnight minimum was 
18.9 °C, with thick fog by morning (Fig. 4), while at 0800 GMT the dew-point had climbed to 21.5 °C, 
and thence to 21.6 °C at 1000 and 1100 GMT. At 1000 GMT a dew-point of 21.8 °C was reported from 
Claremorris. As late as 2000 GMT the dew-point was still as high as 21.1 °C at Kilkenny, while the last 
report of 20 °C or more (20.2 °C) came from Mullingar at 2200 GMT. Not until overnight 13/14 July 
did values of absolute humidity fall below exceptional levels, after what was probably a spell of
unprecedented length. At Birr the dew-point remained at or above 18 °C from 0700 on 12th until 2100 
on 13th inclusive, an unbroken period of 39 hours, including 33 consecutive hours at or above 19 °C and 
17 hours in all above 21°C. Frequency-duration data of dew-point temperatures above specified 
thresholds for eight of the stations identified on Fig. 2 appear in Table II. While this table is probably 
complete for the highest values, the only data available to the author are for the 48 hours ending

Table I. Dry-bulb (Tdry) and dew-point (Tdew) temperatures (°C) at various sites in Ireland, 12-13 July
1983 (extracted from synoptic observations as received at the Meteorological Office)

Time  Kilkenny     Shannon      Galway       Birr         Dublin       Claremorris  Mullingar    Clones
GMT   03960        03962        03964        03965        03969        03970        03971        03974
      Tdry  Tdew   Tdry  Tdew   Tdry  Tdew   Tdry  Tdew   Tdry  Tdew   Tdry  Tdew   Tdry  Tdew   Tdry  Tdew

0600  18.3  17.7   16.3  15.2   17.0  17.0   17.6  ?      ?     ?      ?     ?      ?     ?      ?     ?
0700  19.4  18.6   17.4  16.4   17.6  17.4   18.8  18.0   17.6  15.7   19.2  18.1   18.5  16.5   18.2  16.7
0800  20.9  19.1   18.5  17.4   19.0  18.2   20.0  19.2   19.5  16.9   20.0  18.6   20.1  17.0   20.1  17.3
0900  24.7  21.3   20.1  17.9   20.3  18.4   21.3  20.3   21.3  17.2   22.0  19.0   23.1  18.2   22.0  17.7
1000  25.6  21.1   19.9  18.1   22.2  19.6   22.7  21.5   23.0  17.8   24.0  19.4   25.1  19.9   24.2  18.7

1100  26.0  20.9   20.6  18.0   24.0  20.2   24.2  22.1   24.0  17.6   25.4  20.5   26.5  19.5   25.2  19.5
1200  27.4  21.1   22.6  19.3   24.8  20.6   26.0  23.2   25.9  19.1   26.0  20.4   27.8  20.3   26.6  20.6
1300  28.4  20.4   25.4  20.7   23.5  20.5   25.2  22.8   25.1  18.5   27.0  20.5   27.9  19.7   25.3  21.0
1400  29.0  20.3   26.9  20.7   23.8  19.8   23.5  22.0   24.5  19.1   27.4  20.8   28.4  20.0   26.0  20.9
1500  29.1  20.0   28.4  20.7   25.5  20.8   24.7  22.8   23.0  18.6   26.0  20.5   27.1  21.4   27.5  19.9

1600  29.6  19.8   29.0  21.6   25.9  20.4   25.8  20.9   24.0  18.7   27.2  21.5   27.1  21.7   27.8  19.5
1700  29.5  20.0   29.3  22.7   26.5  21.0   26.4  21.7   24.2  18.9   27.7  20.6   28.1  21.2   27.6  20.0
1800  28.7  21.3   29.0  22.1   26.5  21.3   26.4  22.4   23.8  18.7   27.3  20.2   27.6  20.5   27.3  21.?
1900  26.7  21.3   28.1  23.8   26.0  21.3   26.5  22.6   22.9  18.4   26.6  20.3   26.6  22.0   27.4  21.2
2000  25.2  20.7   24.0  18.1   25.5  21.0   25.5  22.3   22.6  18.2   25.4  20.9   25.1  22.1   26.1  20.6

2100  24.0  20.8   23.1  17.3   23.5  21.0   24.9  22.4   22.4  17.9   23.6  20.9   23.7  21.2   24.1  20.9
2200  22.5  20.2   21.9  19.0   22.5  20.8   23.7  22.2   21.2  18.5   22.2  20.4   21.6  20.4   23.1  20.3
2300  20.6  19.6   21.5  18.9   22.5  20.5   22.4  21.0   20.6  18.0   20.2  19.2   20.8  19.7   21.9  19.6
0000  19.0  18.2   20.7  18.1   22.1  21.3   21.6  20.5   21.2  17.6   19.5  18.7   20.4  19.6   20.5  17.8
0100  18.9  17.8   20.7  18.8   20.2  19.2   20.7  19.7   19.7  17.6   18.9  18.3   19.7  19.1   19.2  17.1

0200  18.8  18.0   20.8  17.9   20.4  19.6   21.1  20.1   19.0  17.9   18.0  17.4   18.7  18.1   18.5  16.5
0300  18.6  18.1   20.0  18.7   20.5  19.9   20.5  19.7   18.7  17.1   17.0  16.7   18.3  18.0   17.9  16.2
0400  17.6  17.3   18.9  17.8   20.0  19.4   20.0  19.8   18.5  17.1   16.6  16.4   17.8  17.5   17.3  15.6
0500  17.5  17.0   18.8  17.7   19.3  18.8   19.3  19.1   18.4  17.2   17.0  16.8   17.8  17.5   17.1  15.2
0600  17.5  16.9   18.7  17.9   18.5  18.5   19.5  19.3   19.0  17.6   18.8  18.5   18.7  18.4   17.4  15.3

The highest values in each column are printed in bold. Figures in italics denote linear interpolations between observations on
either side of the missing hour for observations not received. A question mark denotes a figure that is illegible in the source.


Table II. Frequency-duration of dew-point temperatures above specified thresholds for eight stations
in Ireland for the 48 hourly observations commencing 0000 GMT on 12 July 1983. The figures in
parentheses denote the longest continuous spell within that period.

Figure 4. Simplified plot of observations made at Kilkenny, Shannon and Birr over the 24 hours commencing 0600 GMT on
12 July 1983.


2300 GMT on 13 July. At this time, dew-points at the eight stations ranged between 19.2°C (at 
Mullingar) and 17.1 °C (at Dublin) and accordingly it is certain that consideration of observations for 
14 July would increase some of the spell lengths given in Table II. 
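Spell statistics of the kind given in Table II (the number of hourly observations above a threshold and the longest continuous spell) can be derived from a run of hourly dew-points. The sketch below is only an illustration, not the author's method: the function name is hypothetical, and the sample is the Birr dew-point column of Table I from 0700 GMT on 12 July to 0600 GMT on 13 July, which is a shorter period than the 48 hours that Table II covers.

```python
from itertools import groupby

# Hourly dew-points (degC) at Birr, 0700 GMT 12 July to 0600 GMT 13 July 1983,
# transcribed from Table I (24 observations, not the 48 used for Table II).
BIRR_DEW_POINTS = [
    18.0, 19.2, 20.3, 21.5, 22.1, 23.2, 22.8, 22.0,
    22.8, 20.9, 21.7, 22.4, 22.6, 22.3, 22.4, 22.2,
    21.0, 20.5, 19.7, 20.1, 19.7, 19.8, 19.1, 19.3,
]


def spell_statistics(values, threshold):
    """Return (observations above threshold, longest continuous spell above it)."""
    above = [value > threshold for value in values]
    total = sum(above)
    # groupby splits the flags into runs of equal values; keep the True runs.
    longest = max(
        (sum(1 for _ in run) for flag, run in groupby(above) if flag),
        default=0,
    )
    return total, longest


total_above_20, longest_above_20 = spell_statistics(BIRR_DEW_POINTS, 20.0)
print(total_above_20, longest_above_20)
```

For this sample the longest spell above 20 °C is 16 hourly observations (0900 to 0000 GMT), which agrees with the 16 consecutive hours reported for Birr in the text.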

The month of July 1983 provided many occasions, in almost all parts of the country, of steamy heat of 
an intensity almost unknown in the British Isles but, so far as the author is aware, the degree and 
persistence of absolute humidity that prevailed over central and western Ireland on 12/13 July was
not surpassed. If any readers are aware of any other occasions (in July 1983 or otherwise) when 
authenticated dew-points at any station or stations within the British Isles are known to have reached or 
exceeded the values reported in this article, would they please forward details to the author. 


Satellite photograph — 30 September 1986 at 1412 GMT 


The high-resolution NOAA-9 visible image displayed on the Meteorological Office HERMES
(High-resolution Evaluation of Radiances from MEteorological Satellites) system shows the
distribution of fog and low cloud over central Britain beneath a pronounced low-level inversion which,
according to the 1100 GMT radiosonde ascent (see Fig. 1) from Aughton (marked ‘A’ in inset map), is
300 m above sea level. Following several days of anticyclonic conditions, a weak south-south-westerly
airflow had become established over the British Isles. Cloud originating over the sea is seen to dissipate
over the high ground of North Wales and the Isle of Man. However, cloud appears to be largely deflected
around Cumbria, although tongues of fog do appear to reach the coast, where surface observations
indicate weak sea-breezes. Over the Isle of Man, Ronaldsway (R) in the south had a sunless day with
intermittent drizzle and fog, whilst Point of Ayre (P) in the north had a dry day with 9.6 hours of
sunshine.

Figure 1. Part of the Aughton radiosonde ascent at 1100 GMT on 30 September 1986.

The cloud top shows considerable structure, with lee wave patterns in the north, in particular 
downwind of the Isle of Man where a ‘herring-bone’ pattern is apparent. Over the land, there is evidence 
of banding along the wind direction, particularly within the narrow band of cloud that reaches 
Lancashire (L) via the low-lying Cheshire Plain (C). 


Review 


Oceanic whitecaps and their role in air-sea exchange processes, edited by E.C. Monahan and 
G. MacNiocaill. 168 mm × 247 mm, pp. xii + 294, illus. D. Reidel Publishing Company, Dordrecht,
1986. Price £40.25, US $64.00, Dfl 145.00. 


This book contains 22 papers presented at the 1983 Galway Whitecap Workshop. Abstracts of 
18 poster papers (in some cases with figures), which were also presented, are included at the end. The 
book is introduced with a short biography of Dr Alfred Woodcock who pioneered the measurement of 
aerosols in the marine boundary layer during investigations, carried out in the 1940s and 1950s, into the 
role played by salt particles in the formation of rain in the tropics. 

The papers cover a wide range of topics within the area of air-sea exchange processes. The oceanic 
whitecaps, which were the main concern of the workshop, are involved in the exchange of gases, 
particulates, and electric charge between the ocean and the atmosphere; papers on all three were 
presented at the workshop, the greatest number being on marine aerosols. The papers on marine 
aerosols include a model of the production of droplets at the sea surface by bubble bursting, and at 
higher wind speeds by the tearing of wave crests. Droplets produced at the sea surface at high wind 
speeds may well be important in the transfer of water to the atmosphere. Other papers are on the 
modelling of aerosols in the marine atmosphere and observations of marine aerosols near the surface 
and from satellites. Two papers (Toba and Koga, Hasse) discuss the relationship between the roughness 
of the sea surface and wave characteristics, a topic of great interest to meteorologists. Whitecaps are of 
interest here since they indicate that wave breaking is occurring. 

The workshop was not concerned only with the atmosphere, and papers on the characteristics of 
waves, whitecaps and bubbles are well represented. The final papers in the volume are concerned with 
satellite sensing of whitecaps, either because of possible effects (through changes to the surface albedo or 
emissivity) of whitecaps on the retrieval of other quantities (e.g. the aerosol content, discussed in another 
paper) or as indicators of the near-surface wind speed. 

The style of the papers is the same as that found in scientific journals, while the quality is generally 
higher than is normally found in the proceedings of conferences or workshops. Although some time has 
elapsed between the workshop and the appearance of this volume, I would agree with the editors’
opinion that the papers still provide an up-to-date review of this area. To help make this a valuable 
source book, the editors have also included a large supplementary bibliography of papers which they feel 
are pertinent to the subject but which are not referenced by the other contributors. 

A. Grant